AMENDMENTS TO THE CLAIMS:

1-30. (Canceled) 

31. (Currently amended) A system for identifying genes, comprising: 
a pattern database comprising patterns of amino acids; 

an input device for inputting a genomic DNA sequence; and 
a processor which is configured to:

translate [[translates]] an open reading frame (ORF) of said DNA sequence into an
amino acid translation;

assign weights, $w_i$, to said patterns of amino acids, said weights being given by the
equation $w_i = \log p_i - \log q_i$, where $p_i$ is a probability that a pattern, $t_i$, matches an actual amino
acid sequence at a fixed location, and $q_i$ is a probability that said pattern, $t_i$, matches an amino
acid translation of a non-coding ORF;

locate [[locates]] in said amino acid translation occurrences of said weighted patterns,
and assign a coding quality measure for said ORF based on a sum of said weighted patterns
which are located in said amino acid translation of said ORF from said pattern database; and

identify [[determines whether]] said open reading frame as including [[includes]] a
putative gene if a value of said coding quality measure is greater than a predetermined threshold
value based on a number of said patterns of amino acids located in said amino acid translation of
said ORF, and/or weighted values associated with said patterns of amino acids located in said
amino acid translation of said ORF.
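The weighting and scoring steps of claim 31 can be illustrated with a minimal Python sketch. It rests on three assumptions that are not in the claims:

- Patterns are strings in which `.` stands for any single residue, so Python's `re` module can match them directly.
- The natural logarithm is used, since the claim does not fix a base.
- Each pattern counts once if it matches anywhere in the translation.

The function names and the probability values are hypothetical.

```python
import math
import re


def pattern_weight(p_i: float, q_i: float) -> float:
    """Weight w_i = log p_i - log q_i (natural log assumed)."""
    return math.log(p_i) - math.log(q_i)


def coding_quality(translation: str, weights: dict) -> float:
    """Sum the weights of all patterns that occur at least once in the translation."""
    return sum(w for pattern, w in weights.items() if re.search(pattern, translation))


def is_putative_gene(translation: str, weights: dict, threshold: float) -> bool:
    """Call the ORF a putative gene when its coding quality exceeds the threshold."""
    return coding_quality(translation, weights) > threshold


# Hypothetical p and q values, chosen only for illustration.
weights = {
    "M.K": pattern_weight(0.02, 0.005),  # likelier in coding sequence, so w > 0
    "W..W": pattern_weight(0.01, 0.02),  # likelier in non-coding ORFs, so w < 0
}
translation = "MAKLWGGW"
print(round(coding_quality(translation, weights), 3))  # 0.693
print(is_putative_gene(translation, weights, threshold=0.5))  # True
```

A pattern seen more often in non-coding ORFs than in real proteins gets a negative weight. Its presence therefore lowers the coding quality measure rather than raising it.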

32. (Previously presented) The system according to claim 31, wherein said processor 
translates a plurality of open reading frames in said DNA sequence into amino acid translations, 
and locates in each amino acid translation occurrences of said patterns to determine whether each
of said plurality of open reading frames includes a putative gene.
33. (Previously presented) The system according to claim 32, wherein said patterns comprise
biologically significant patterns of amino acids in amino acid sequences. 

34. (Previously presented) The system according to claim 31, wherein said processor 
identifies a match of a pattern from said pattern database in said amino acid translation.

35. (Previously presented) The system according to claim 34, wherein said patterns are 
derived from a parent database comprising at least one amino acid sequence. 

36. (Previously presented) The system according to claim 34, wherein said patterns are
derived from a parent database comprising at least one amino acid sequence fragment. 

37. (Previously presented) The system according to claim 34, wherein said patterns are 
derived by using a pattern discovery algorithm. 

38. (Previously presented) The system according to claim 34, wherein said patterns are 
derived by using the Teiresias algorithm. 

39. (Previously presented) The system according to claim 34, wherein said ORF comprises a 
portion of said DNA sequence between a start codon and a stop codon. 
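The ORF of claim 39 (start codon to stop codon) and its amino acid translation can be sketched as follows. The sketch assumes the standard genetic code (NCBI translation table 1), ATG as the only start codon and the forward strand only. The reverse complement would need the same scan. The helper names are hypothetical.

```python
# Standard genetic code (NCBI table 1), codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate codon by codon; unknown codons (e.g. containing N) become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def find_orfs(dna: str):
    """Yield (start, end, protein) for forward-strand ORFs running from the first
    ATG in a frame to the next in-frame stop codon (stop excluded from protein)."""
    dna = dna.upper()
    for frame in range(3):
        start = None
        for pos in range(frame, len(dna) - 2, 3):
            codon = dna[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos
            elif start is not None and CODON_TABLE.get(codon) == "*":
                yield start, pos + 3, translate(dna[start:pos])
                start = None


print(list(find_orfs("CCATGAAATGGTAGCC")))  # [(2, 14, 'MKW')]
```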

40. (Previously presented) The system according to claim 34, wherein said processor reports
said ORF as a putative gene when a predetermined number of pattern matches is identified in
said amino acid translation.

41. (Previously presented) The system according to claim 34, wherein each pattern is 
assigned a weight depending upon a relevance of said pattern in determining whether said ORF 
comprises a putative gene. 
42. (Currently amended) The system according to claim 34, wherein said processor is
configured to select a start codon which results in a greatest value of said coding quality measure,
in a case in which plural start codons match the same stop codon [[said ORF is reported as a
putative gene when the sum of weights corresponding to all patterns with matches in said amino
acid translation exceeds a predetermined threshold]].
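The start-codon selection of claim 42 can be sketched under the assumption that ATG is the only start codon. Every in-frame ATG sharing one stop codon then appears as an `M` in the translation from the most upstream ATG. Each candidate translation is simply the suffix beginning at one of those `M` residues. Tie-breaking toward the most upstream start is a hypothetical choice, as are the function names.

```python
import re


def coding_quality(protein: str, weights: dict) -> float:
    """Sum of weights of the patterns that match somewhere in the protein."""
    return sum(w for pattern, w in weights.items() if re.search(pattern, protein))


def best_start(orf_protein: str, weights: dict):
    """Pick the start codon whose translation has the greatest coding quality.

    orf_protein is the translation from the most upstream ATG to the stop codon.
    Each 'M' marks an in-frame ATG sharing that stop codon. Ties go to the most
    upstream start. Returns (offset in residues, score)."""
    candidates = [i for i, aa in enumerate(orf_protein) if aa == "M"]
    scores = {i: coding_quality(orf_protein[i:], weights) for i in candidates}
    best = max(candidates, key=lambda i: scores[i])  # max keeps the first of equal scores
    return best, scores[best]


# Hypothetical weights: the negative pattern lies only between the two starts.
print(best_start("MAAWGGWMKL", {"A.W": -3.0, "M.L": 1.0}))  # (7, 1.0)
```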

43. (Previously presented) The system according to claim 34, wherein said match is identified 
using a predetermined pattern matching algorithm. 

44. (Previously presented) The system according to claim 34, further comprising: 

a memory device for storing data and instructions to be executed by said processor. 

45. (Previously presented) The system according to claim 34, further comprising: 
a display device for displaying an output from said processor. 

46. (Currently amended) A method of identifying genes, comprising: 
providing a pattern database comprising patterns of amino acids; 
determining an open reading frame (ORF) in a genomic DNA sequence; 
generating an amino acid translation for said ORF; 

assigning weights, $w_i$, to said patterns of amino acids, said weights being given by the
equation $w_i = \log p_i - \log q_i$, where $p_i$ is a probability that a pattern, $t_i$, matches an actual amino
acid sequence at a fixed location, and $q_i$ is a probability that said pattern, $t_i$, matches an amino
acid translation of a non-coding ORF;

locating a match of said weighted patterns [[a pattern from said pattern database]] in said
amino acid translation and assigning a coding quality measure for said ORF based on a sum of
said weighted patterns which are located in said amino acid translation of said ORF;

identifying [[determining whether]] said ORF as including [[includes]] a putative gene if a value
of said coding quality measure is greater than a predetermined threshold value based on a number
of said matching patterns of amino acids located in said amino acid translation of said ORF,
and/or a weighted value associated with a matching pattern located in said amino acid translation
of said ORF; and

displaying a result of said identifying [[determining whether]] said ORF as including
[[includes]] a putative gene.

47. (Previously presented) The method according to claim 46, wherein said pattern database 
is generated from a database comprising at least one amino acid sequence. 

48. (Previously presented) The method according to claim 46, wherein said pattern database 
is generated from a database comprising at least one amino acid sequence fragment. 

49. (Previously presented) The method according to claim 46, wherein said probability, $p_i$, is
calculated based on a training set [[further comprising: identifying said ORF as a putative gene
when a predetermined number of pattern matches is identified in said amino acid translation]].

50. (Currently amended) The method according to claim 49 [[46]], wherein said probability, $q_i$,
is calculated by computing a number of occurrences of a pattern in ORFs that are not identified
as coding in said training set [[further comprising: assigning a weight to each pattern depending
upon a relevance of said pattern in determining whether said ORF includes a putative gene]].
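Claims 49 and 50 estimate $p_i$ and $q_i$ from a training set of coding and non-coding ORFs. One plausible reading is sketched below: count the positions where a pattern matches, then divide by the number of positions where it could start, which gives a per-location probability. That normalization, the pseudocount smoothing and the restriction to patterns of letters and `.` wildcards are all assumptions of this sketch, not details taken from the claims.

```python
import math
import re


def per_position_rate(pattern: str, sequences, pseudocount: float = 1.0) -> float:
    """Fraction of possible start positions at which `pattern` matches.

    Assumes the pattern contains only residue letters and '.' wildcards, so it
    always spans len(pattern) residues. The pseudocount (a hypothetical
    smoothing choice) keeps the rate above zero so its log is defined."""
    compiled = re.compile(pattern)
    span = len(pattern)
    hits = positions = 0
    for seq in sequences:
        for pos in range(len(seq) - span + 1):
            positions += 1
            if compiled.match(seq, pos):  # match anchored at this position
                hits += 1
    return (hits + pseudocount) / (positions + 2 * pseudocount)


# Hypothetical training set: translations of ORFs labeled coding / non-coding.
coding = ["MAKW", "MAKL"]
non_coding = ["GGGG"]
p = per_position_rate("A.W", coding)      # (1 + 1) / (4 + 2) = 1/3
q = per_position_rate("A.W", non_coding)  # (0 + 1) / (2 + 2) = 1/4
print(round(math.log(p) - math.log(q), 4))  # 0.2877
```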

51. (Previously presented) The method according to claim 46, further comprising: 
displaying said match of said pattern in said amino acid translation. 

52. (Previously presented) The method according to claim 46, wherein said pattern database 
is generated using the Teiresias algorithm to derive said patterns from a parent database. 
53. (Currently amended) A programmable storage medium tangibly embodying a program of
machine-readable instructions executable by a digital processing apparatus to perform a method
for identifying genes, said method comprising:

providing a pattern database comprising patterns of amino acids; 

determining an open reading frame (ORF) in a given genomic DNA sequence;

generating an amino acid translation for each ORF; 
assigning weights, $w_i$, to said patterns of amino acids, said weights being given by the equation
$w_i = \log p_i - \log q_i$, where $p_i$ is a probability that a pattern, $t_i$, matches an actual amino acid sequence
at a fixed location, and $q_i$ is a probability that said pattern, $t_i$, matches an amino acid translation
of a non-coding ORF;

locating a match of said weighted patterns [[a pattern from said pattern database]] in said
amino acid translation and assigning a coding quality measure for said ORF based on a sum of
said weighted patterns which are located in said amino acid translation of said ORF;

identifying [[determining whether]] said ORF as including [[includes]] a putative gene if a value
of said coding quality measure is greater than a predetermined threshold value based on a number
of said matching patterns of amino acids located in said amino acid translation of said ORF,
and/or a weighted value associated with a matching pattern located in said amino acid translation
of said ORF; and

displaying a result of said identifying [[determining whether]] said ORF as including
[[includes]] a putative gene.

54. (Previously presented) The system according to claim 33, wherein said processor 
determines for each pattern in said pattern database whether the pattern is present in said amino
acid translation by locating instances of said patterns in said amino acid translation, until a sum
of weights corresponding to all patterns with matches in said amino acid translation exceeds a 
predetermined threshold, at which point said processor identifies said ORF as a putative gene. 
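The early-stopping scan of claim 54 might look like the sketch below. The pattern order, the weights and the function name are hypothetical, and patterns are again assumed to be `re`-compatible strings.

```python
import re


def early_exit_call(protein: str, weighted_patterns, threshold: float):
    """Scan patterns in the given order, adding the weight of each one that
    matches; stop and report a putative gene as soon as the running sum
    exceeds the threshold. Returns (is_putative_gene, running_sum)."""
    total = 0.0
    for pattern, w in weighted_patterns:
        if re.search(pattern, protein):
            total += w
            if total > threshold:
                return True, total
    return False, total


patterns = [("A.W", 1.0), ("W..W", 2.0), ("ZZZ", 5.0)]  # hypothetical weights
print(early_exit_call("MAKWGGW", patterns, threshold=2.5))  # (True, 3.0)
```

When every weight is non-negative, stopping early gives the same call as summing all matched patterns. If some unchecked patterns carry negative weights (where $q_i > p_i$), the early call can differ from the full sum.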



55. (Previously presented) The system according to claim 31, further comprising: 
a parent database comprising a plurality of amino acid sequences, said patterns in said
pattern database being derived from said plurality of amino acid sequences by using a pattern 
discovery algorithm; 

a memory device for storing data and instructions to be executed by said processor; and
a display device for displaying an output from said processor. 

56. (Previously presented) The system according to claim 55, wherein said open reading 
frame (ORF) comprises a portion of said DNA sequence between a start codon and a stop codon, 

wherein said processor identifies a match of a pattern from said pattern database in said 
amino acid translation by using a predetermined pattern matching algorithm, 

wherein each pattern is assigned a weight depending upon a relevance of said pattern in 
determining whether said ORF comprises a putative gene, and

wherein said ORF is reported as a putative gene when either a predetermined number of 
pattern matches is identified in said amino acid translation, or a sum of weights corresponding to 
all patterns with matches in said amino acid translation exceeds a predetermined threshold. 
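The two alternative reporting rules of claim 56 can be combined in a short sketch. It makes two assumptions: "a predetermined number of pattern matches" means at least that many distinct patterns matching, and patterns are `re`-compatible strings. The function name and weights are hypothetical.

```python
import re


def reported_as_gene(protein: str, weights: dict, min_matches: int, threshold: float) -> bool:
    """Report the ORF when enough distinct patterns match, or when the weights
    of the matching patterns sum past the threshold."""
    matched = [w for pattern, w in weights.items() if re.search(pattern, protein)]
    return len(matched) >= min_matches or sum(matched) > threshold


weights = {"A.W": -1.0, "M.K": 0.5, "W..W": 0.2}  # hypothetical weights
print(reported_as_gene("MAKWGGW", weights, min_matches=3, threshold=10.0))  # True
print(reported_as_gene("MAKWGGW", weights, min_matches=4, threshold=0.0))  # False
```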

57. (Previously presented) The system according to claim 31, wherein said processor accesses 
said pattern database to retrieve said patterns from said pattern database. 

58. (Previously presented) The system according to claim 31, wherein said processor is 
electrically coupled to said input device and said pattern database. 

59. (Currently amended) A system for identifying genes, comprising: 
an input device which inputs a genomic DNA sequence; and 

a processor which is configured to:

access [[accesses]] a pattern database comprising a plurality of patterns of amino
acids;

translate [[translates]] an open reading frame (ORF) of said DNA sequence into an
amino acid translation;

assign weights, $w_i$, to said patterns of amino acids, said weights being given by the
equation $w_i = \log p_i - \log q_i$, where $p_i$ is a probability that a pattern, $t_i$, matches an actual amino
acid sequence at a fixed location, and $q_i$ is a probability that said pattern, $t_i$, matches an amino
acid translation of a non-coding ORF;

locate [[locates]] in said amino acid translation occurrences of said weighted patterns,
and assign a coding quality measure for said ORF based on a sum of said weighted patterns
which are located in said amino acid translation of said ORF from said pattern database; and

identify [[determines whether]] said open reading frame as including [[includes]] a
putative gene if a value of said coding quality measure is greater than a predetermined threshold
value based on a number of said patterns of amino acids located in said amino acid translation of
said ORF, and/or weighted values associated with said patterns of amino acids located in said
amino acid translation of said ORF.

60. (Currently amended) A system for identifying genes, comprising: 
an input device which inputs a query genomic DNA sequence; 
a processor which is configured to:

access [[accesses]] a pattern database comprising a plurality of patterns of amino
acids;

translate [[translates]] an open reading frame (ORF) of said DNA sequence into an
amino acid translation;

assign weights, $w_i$, to said patterns of amino acids, said weights being given by the
equation $w_i = \log p_i - \log q_i$, where $p_i$ is a probability that a pattern, $t_i$, matches an actual amino
acid sequence at a fixed location, and $q_i$ is a probability that said pattern, $t_i$, matches an amino
acid translation of a non-coding ORF;

locate [[locates]] in said amino acid translation occurrences of said weighted patterns,
and assign a coding quality measure for said ORF based on a sum of said weighted patterns
which are located in said amino acid translation of said ORF from said pattern database; and
identify [[determines whether]] said open reading frame as including [[includes]] a
putative gene if a value of said coding quality measure is greater than a predetermined threshold
value based on a number of said patterns of amino acids located in said amino acid translation of
said ORF, and/or weighted values associated with said patterns of amino acids located in said
amino acid translation of said ORF;

a display device for displaying an output of said processor, said output including an 
occurrence of said patterns in said amino acid translation, 

wherein said patterns comprise patterns derived using a Teiresias algorithm,

wherein said open reading frame (ORF) comprises a portion of said DNA sequence 
between a start codon and a stop codon, and 

wherein said processor identifies a match of a pattern from said pattern database in said 
amino acid translation by using a predetermined pattern matching algorithm,

wherein each pattern is assigned a weight depending upon a relevance of said pattern in
determining whether said ORF comprises a putative gene, and

wherein said ORF is reported as a putative gene when either a predetermined number of
pattern matches is identified in said amino acid translation, or a sum of weights corresponding to
all patterns with matches in said amino acid translation exceeds a predetermined threshold.



